Bayesian network classifiers which perform well with continuous attributes: Flexible classifiers
نویسنده
چکیده
When modelling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous works have solved the problem by discretizing them with the consequent loss of information. Another common alternative assumes that the data are generated by a Gaussian distribution (parametric approach), such as conditional Gaussian networks, with the consequent error in the estimation if the true density differs from it. In order to break with the strong parametric assumption, this work introduces the conditional flexible network paradigm for supervised classification. This paradigm is a Bayesian network which estimates the true density of the continuous variables using kernels. Moreover, some of the most popular Bayesian multinomial network based classifier induction algorithms (naive Bayes, tree-augmented naive Bayes, k-dependence Bayesian classifier and Bayesian network-augmented naive Bayes) are adapted to the conditional flexible network paradigm. Besides their thresholded versions are introduced in order to avoid the compulsory addition of arcs between low correlated variables. The conditional flexible network can be seen as a generalization of the conditional Gaussian network because it allows a more flexible and precise estimation of the true densities. From the point of view of modelling correlations between predictor variables, the classifiers presented in this work can be seen as the natural extension of the flexible naive Bayes classifier proposed by John and Langley (1995) breaking with the naive Bayes independence assumption allowing dependencies between variables. Flexible tree-augmented naive Bayes seems to have superior behavior for the supervised classification among the flexible classifiers. Besides, flexible classifiers obtain quite competitive errors compared with the stateof-the-art classifiers.
منابع مشابه
Comparing Case-Based Bayesian Network and Recursive Bayesian Multi-Net Classifiers
Recent work in Bayesian classifiers has shown that a better and more flexible representation of domain knowledge results in more accurate classifiers. We have recently examined a new type of Bayesian classifiers called Case-Based Bayesian Network (CBBN) classifiers. The basic idea is to partition the training data into semantically sound clusters. A local BN classifier is then learned independe...
متن کاملPossibilistic classifiers for numerical data
Naive Bayesian Classifiers, which rely on independence hypotheses, together with a normality assumption to estimate densities for numerical data, are known for their simplicity and their effectiveness. However, estimating densities, even under the normality assumption, may be problematic in case of poor data. In such a situation, possibility distributions may provide a more faithful representat...
متن کاملVoting Massive Collections of Bayesian Network Classifiers for Data Streams
We present a new method for voting exponential (in the number of attributes) size sets of Bayesian classifiers in polynomial time with polynomial memory requirements. Training is linear in the number of instances in the dataset and can be performed incrementally. This allows the collection to learn from massive data streams. The method allows for flexibility in balancing computational complexit...
متن کاملFloating search algorithm for structure learning of Bayesian network classifiers
This paper presents a floating search approach for learning the network structure of Bayesian network classifiers. A Bayesian network classifier is used which in combination with the search algorithm allows simultaneous feature selection and determination of the structure of the classifier. The introduced search algorithm enables conditional exclusions of previously added attributes and/or arcs...
متن کاملConditional Log-Likelihood for Continuous Time Bayesian Network Classifiers
Continuous time Bayesian network classifiers are designed for analyzing multivariate streaming data when time duration of events matters. New continuous time Bayesian network classifiers are introduced while their conditional log-likelihood scoring function is developed. A learning algorithm, combining conditional log-likelihood with Bayesian parameter estimation is developed. Classification ac...
متن کامل